Overview

Dataset statistics

Number of variables: 18
Number of observations: 8950
Missing cells: 314
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.7 MiB
Average record size in memory: 199.0 B
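As a cross-check, Missing cells (%) is the missing-cell count divided by variables × observations: 314 / (18 × 8950) = 314 / 161100 ≈ 0.19%, displayed rounded as 0.2%. The sketch below recomputes the overview counts with the Python standard library. It is an illustration, not how pandas-profiling computes them, and it assumes the data is available as string cells in which an empty string marks a missing value. The function names are illustrative, and `overview_from_csv` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, Sequence


def dataset_overview(header: Sequence[str], rows: Iterable[Sequence[str]]) -> Dict[str, float]:
    """Overview counts for a table of string cells ("" is treated as missing)."""
    rows = [tuple(r) for r in rows]
    n_vars, n_obs = len(header), len(rows)
    total_cells = n_vars * n_obs
    missing = sum(1 for r in rows for cell in r if cell == "")
    # Rows identical to an earlier row; the first occurrence is not counted.
    duplicates = n_obs - len(set(rows))
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }


def overview_from_csv(path: str) -> Dict[str, float]:
    """Hypothetical helper: read a CSV file whose first line is the header."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return dataset_overview(header, reader)


if __name__ == "__main__":
    header = ["CUST_ID", "CREDIT_LIMIT", "MINIMUM_PAYMENTS"]
    rows = [["C1", "1000", ""], ["C2", "", "50.5"], ["C2", "", "50.5"]]
    print(dataset_overview(header, rows))
    # The report's own figure: 314 missing cells out of 18 * 8950 cells.
    print(f"{100 * 314 / (18 * 8950):.1f}%")  # → 0.2%
```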

Variable types

NUM: 17
CAT: 1

Reproduction

Analysis started: 2020-03-27 11:52:11.108598
Analysis finished: 2020-03-27 11:52:59.520269
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

CUST_ID has a high cardinality: 8950 distinct values
ONEOFF_PURCHASES is highly correlated with PURCHASES
PURCHASES is highly correlated with ONEOFF_PURCHASES
MINIMUM_PAYMENTS has 313 (3.5%) missing values
PURCHASES has 2044 (22.8%) zeros
ONEOFF_PURCHASES has 4302 (48.1%) zeros
INSTALLMENTS_PURCHASES has 3916 (43.8%) zeros
CASH_ADVANCE has 4628 (51.7%) zeros
PURCHASES_FREQUENCY has 2043 (22.8%) zeros
ONEOFF_PURCHASES_FREQUENCY has 4302 (48.1%) zeros
PURCHASES_INSTALLMENTS_FREQUENCY has 3915 (43.7%) zeros
CASH_ADVANCE_FREQUENCY has 4628 (51.7%) zeros
CASH_ADVANCE_TRX has 4628 (51.7%) zeros
PURCHASES_TRX has 2044 (22.8%) zeros
PAYMENTS has 240 (2.7%) zeros
PRC_FULL_PAYMENT has 5903 (66.0%) zeros
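Each of these warnings is a per-column count divided by the 8950 observations, e.g. 313 / 8950 ≈ 3.5% missing for MINIMUM_PAYMENTS. The stdlib sketch below produces lines in the same wording from string cells ("" counted as missing). The zero-percentage cut-off is a hypothetical parameter: this report does not state the threshold pandas-profiling applied, only that BALANCE (0.9% zeros) is not flagged while PAYMENTS (2.7%) is. The function name is illustrative.

```python
from typing import List, Sequence


def zero_and_missing_warnings(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    zero_threshold_pct: float = 1.0,  # hypothetical cut-off, not taken from the report
) -> List[str]:
    """Report-style warning lines from string cells ("" is treated as missing)."""
    n = len(rows)
    warnings: List[str] = []
    for j, name in enumerate(header):
        column = [r[j] for r in rows]
        missing = sum(1 for v in column if v == "")
        zeros = 0
        for v in column:
            try:
                if v != "" and float(v) == 0.0:
                    zeros += 1
            except ValueError:
                pass  # non-numeric cells such as customer IDs are never zeros
        if missing:
            warnings.append(f"{name} has {missing} ({100 * missing / n:.1f}%) missing values")
        if zeros and 100 * zeros / n > zero_threshold_pct:
            warnings.append(f"{name} has {zeros} ({100 * zeros / n:.1f}%) zeros")
    return warnings


if __name__ == "__main__":
    header = ["CUST_ID", "PAYMENTS", "MINIMUM_PAYMENTS"]
    rows = [["C1", "0", "10.5"], ["C2", "5.2", ""], ["C3", "0", "7.0"], ["C4", "2.0", "1.1"]]
    for line in zero_and_missing_warnings(header, rows):
        print(line)
    # PAYMENTS has 2 (50.0%) zeros
    # MINIMUM_PAYMENTS has 1 (25.0%) missing values
```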

Variables

CUST_ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 8950
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 70.0 KiB
Common values
Value | Count | Frequency (%)
C16757 | 1 | < 0.1%
C10801 | 1 | < 0.1%
C13448 | 1 | < 0.1%
C18832 | 1 | < 0.1%
C12776 | 1 | < 0.1%
C17275 | 1 | < 0.1%
C13437 | 1 | < 0.1%
C11875 | 1 | < 0.1%
C17487 | 1 | < 0.1%
C12843 | 1 | < 0.1%
Other values (8940) | 8940 | 99.9%

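Length

Max length: 6
Mean length: 6
Min length: 6

Unicode category | Distinct characters | Frequency (%)
Decimal_Number | 10 | 90.9%
Uppercase_Letter | 1 | 9.1%

Unicode script | Distinct characters | Frequency (%)
Common | 10 | 90.9%
Latin | 1 | 9.1%

Unicode block | Distinct characters | Frequency (%)
ASCII | 11 | 100.0%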
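The character tables count distinct characters: CUST_ID values use the ten digits plus the letter "C", so 10 / 11 ≈ 90.9% are decimal digits and 1 / 11 ≈ 9.1% is an uppercase letter. The names Decimal_Number and Uppercase_Letter correspond to the Unicode general categories Nd and Lu, which Python's `unicodedata.category` returns. The sketch below reproduces only the category table; `unicodedata` does not expose scripts or blocks, and this is not the tool's own implementation. The mapping dictionary and function name are illustrative.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for a few general-category codes returned by unicodedata.category.
# Any code not listed here is reported as the raw two-letter code.
CATEGORY_NAMES = {
    "Nd": "Decimal_Number",
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
}


def character_categories(values: Iterable[str]) -> Counter:
    """Count the *distinct* characters in `values` per Unicode general category."""
    distinct_chars = set()
    for value in values:
        distinct_chars.update(value)
    return Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in distinct_chars
    )


if __name__ == "__main__":
    # The ten customer IDs shown in the report's "First rows" sample.
    ids = [f"C{n}" for n in range(10001, 10011)]
    print(character_categories(ids))
    # → Counter({'Decimal_Number': 10, 'Uppercase_Letter': 1})
```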

BALANCE
Real number (ℝ≥0)

Distinct count: 8871
Unique (%): 99.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1564.474828
Minimum: 0
Maximum: 19043.13856
Zeros: 80
Zeros (%): 0.9%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 8.81451835
Q1: 128.2819155
Median: 873.385231
Q3: 2054.140036
95th percentile: 5909.111808
Maximum: 19043.13856
Range: 19043.13856
Interquartile range (IQR): 1925.85812
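Range is Maximum - Minimum and IQR is Q3 - Q1; for BALANCE, 2054.140036 - 128.2819155 = 1925.85812, as reported. The sketch below recomputes the quantile statistics under the assumption (not stated in this report) that percentiles are linearly interpolated between sorted values, the default of pandas `Series.quantile`. Python's `statistics.quantiles` with `method="inclusive"` follows the same rule. The function name and dictionary keys are illustrative.

```python
import statistics
from typing import Dict, Sequence


def quantile_summary(values: Sequence[float]) -> Dict[str, float]:
    """Quantile statistics using linear interpolation between sorted values.

    method="inclusive" treats the minimum and maximum as the 0th and 100th
    percentiles, the same rule as linear interpolation in pandas/numpy.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    cut_points = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "minimum": data[0],
        "p5": cut_points[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": cut_points[-1],
        "maximum": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    # Linear interpolation on [1, 2, 3, 4]: Q1 = 1.75, median = 2.5, Q3 = 3.25.
    print(quantile_summary([1, 2, 3, 4]))
```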

Descriptive statistics

Standard deviation: 2081.531879
Coefficient of variation (CV): 1.330498799
Kurtosis: 7.6747513
Mean: 1564.474828
Mean absolute deviation (MAD): 1459.747302
Skewness: 2.393386043
Sum: 14002049.71
Variance: 4332774.965
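The descriptive statistics are tied together by simple identities: CV = standard deviation / mean (2081.531879 / 1564.474828 ≈ 1.3305 for BALANCE), variance = standard deviation squared, and mean = sum / count (14002049.71 / 8950 ≈ 1564.47). pandas-profiling v2.5 labels the MAD row "Median Absolute Deviation", but its values match the mean absolute deviation around the mean: for TENURE, 84.7% of the values equal the median of 12, so the median absolute deviation is 0, yet the report shows 0.818. The stdlib sketch below assumes pandas conventions (sample variance with n - 1, bias-adjusted skewness as in `Series.skew`); the function name and keys are illustrative.

```python
import math
import statistics
from typing import Dict, Sequence


def descriptive_summary(values: Sequence[float]) -> Dict[str, float]:
    """Moment-based summary; needs at least 3 values, a non-zero mean and some spread."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divides by n - 1
    std = math.sqrt(variance)
    # Mean absolute deviation around the mean (NOT the median absolute deviation).
    mean_abs_dev = sum(abs(x - mean) for x in values) / n
    # Bias-adjusted Fisher-Pearson skewness: sqrt(n(n-1)) / (n-2) * m3 / m2**1.5,
    # where m2 and m3 are the divide-by-n central moments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    return {
        "mean": mean,
        "standard_deviation": std,
        "variance": variance,
        "cv": std / mean,
        "mean_abs_dev": mean_abs_dev,
        "skewness": skewness,
        "sum": math.fsum(values),
    }


if __name__ == "__main__":
    # TENURE rebuilt from its value counts in this report.
    tenure = ([12] * 7584 + [11] * 365 + [10] * 236 + [9] * 175
              + [8] * 196 + [7] * 190 + [6] * 204)
    summary = descriptive_summary(tenure)
    print(round(summary["mean_abs_dev"], 6))  # → 0.818024
    med = statistics.median(tenure)
    # The median absolute deviation of the same data is zero.
    print(statistics.median(abs(x - med) for x in tenure))  # → 0.0
```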
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 9.95000000e-05 8.16750000e-03 8.84690150e+00 2.89368260e+01 ... 6.07344107e+03 8.11840880e+03 9.65527800e+03 1.25372974e+04 1.90431386e+04], "bayesian blocks" binning strategy used)
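The fixed-size histogram splits [minimum, maximum] into 10 bins of equal width; for BALANCE each bin is 19043.13856 / 10 ≈ 1904.31 wide. A minimal sketch of that binning, assuming numpy.histogram's edge convention (each bin is half-open except the last, which also includes the maximum). The variable-size "bayesian blocks" histogram uses a different, data-adaptive method and is not reproduced here. The function name is illustrative.

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(values: Sequence[float], bins: int = 10) -> Tuple[List[float], List[int]]:
    """Equal-width bins over [min, max]; the last bin also includes the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0  # constant data: everything in bin 0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in values:
        # Bins are [edge_i, edge_i+1); clamping sends x == max into the final bin.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return edges, counts


if __name__ == "__main__":
    edges, counts = fixed_width_histogram([0, 1, 1, 2, 3, 5, 8, 13, 21, 34], bins=5)
    print(edges)
    print(counts)  # → [6, 2, 0, 1, 1]
```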
Common values
Value | Count | Frequency (%)
0 | 80 | 0.9%
1100.941072 | 1 | < 0.1%
40.074484 | 1 | < 0.1%
2093.844656 | 1 | < 0.1%
179.765708 | 1 | < 0.1%
12.654903 | 1 | < 0.1%
1893.704851 | 1 | < 0.1%
1571.218695 | 1 | < 0.1%
31.285608 | 1 | < 0.1%
1772.323491 | 1 | < 0.1%
Other values (8861) | 8861 | 99.0%

Minimum 5 values
Value | Count | Frequency (%)
0 | 80 | 0.9%
0.000199 | 1 | < 0.1%
0.001146 | 1 | < 0.1%
0.001214 | 1 | < 0.1%
0.001289 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
19043.13856 | 1 | < 0.1%
18495.55855 | 1 | < 0.1%
16304.88925 | 1 | < 0.1%
16259.44857 | 1 | < 0.1%
16115.5964 | 1 | < 0.1%
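The "Other values" row collects every distinct value not listed among the top 10: for BALANCE, 8871 distinct values - 10 listed = 8861, covering 8950 - (80 + 9) = 8861 observations. The sketch below builds the common-values and extreme-values tables with `collections.Counter`. Values with equal counts may appear in a different order than in the report, since the tool's tie-breaking rule is not stated here; the function names are illustrative.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[str, int, float]  # (value label, count, frequency in percent)


def common_values(values: Sequence[Hashable], top: int = 10) -> List[Row]:
    """Most frequent `top` values plus an "Other values (k)" row for the rest."""
    n = len(values)
    ranked = Counter(values).most_common()  # sorted by count, descending
    rows: List[Row] = [(str(v), c, 100 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / n))
    return rows


def extreme_values(values: Sequence[float], k: int = 5) -> Tuple[List[Tuple[float, int]], List[Tuple[float, int]]]:
    """The `k` smallest and `k` largest distinct values, each with its count."""
    ordered = sorted(Counter(values).items())
    return ordered[:k], ordered[::-1][:k]


if __name__ == "__main__":
    data = [0] * 5 + [1, 2, 3] + [4] * 2
    for label, count, pct in common_values(data, top=2):
        print(f"{label} | {count} | {pct:.1f}%")
    # 0 | 5 | 50.0%
    # 4 | 2 | 20.0%
    # Other values (3) | 3 | 30.0%
    print(extreme_values(data, k=2))  # → ([(0, 5), (1, 1)], [(4, 2), (3, 1)])
```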

BALANCE_FREQUENCY
Real number (ℝ≥0)

Distinct count: 43
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8772707256
Minimum: 0
Maximum: 1
Zeros: 80
Zeros (%): 0.9%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.272727
Q1: 0.888889
Median: 1
Q3: 1
95th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.111111

Descriptive statistics

Standard deviation: 0.2369040027
Coefficient of variation (CV): 0.2700466296
Kurtosis: 3.092369622
Mean: 0.8772707256
Mean absolute deviation (MAD): 0.1736723384
Skewness: -2.023265519
Sum: 7851.572994
Variance: 0.05612350649
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0954545 0.1742425 0.190909 0.2613635 ... 0.8257575 0.8819445 0.9045455 0.9545455 1. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
1 | 6211 | 69.4%
0.909091 | 410 | 4.6%
0.818182 | 278 | 3.1%
0.727273 | 223 | 2.5%
0.545455 | 219 | 2.4%
0.636364 | 209 | 2.3%
0.454545 | 172 | 1.9%
0.363636 | 170 | 1.9%
0.272727 | 151 | 1.7%
0.181818 | 146 | 1.6%
Other values (33) | 761 | 8.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 80 | 0.9%
0.090909 | 67 | 0.7%
0.1 | 8 | 0.1%
0.111111 | 5 | 0.1%
0.125 | 9 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1 | 6211 | 69.4%
0.909091 | 410 | 4.6%
0.9 | 55 | 0.6%
0.888889 | 53 | 0.6%
0.875 | 57 | 0.6%

PURCHASES
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 6203
Unique (%): 69.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1003.204834
Minimum: 0
Maximum: 49039.57
Zeros: 2044
Zeros (%): 22.8%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 39.635
Median: 361.28
Q3: 1110.13
95th percentile: 3998.6195
Maximum: 49039.57
Range: 49039.57
Interquartile range (IQR): 1070.495

Descriptive statistics

Standard deviation: 2136.634782
Coefficient of variation (CV): 2.129809098
Kurtosis: 111.3887709
Mean: 1003.204834
Mean absolute deviation (MAD): 1079.796087
Skewness: 8.144269065
Sum: 8978683.26
Variance: 4565208.191
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0000000e+00 5.0000000e-03 3.0000000e-02 1.9890000e+01 2.0045000e+01 ... 6.0185650e+03 8.8577750e+03 1.2717255e+04 2.7874050e+04 4.9039570e+04], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 2044 | 22.8%
45.65 | 27 | 0.3%
150 | 16 | 0.2%
60 | 16 | 0.2%
100 | 13 | 0.1%
300 | 13 | 0.1%
200 | 13 | 0.1%
450 | 12 | 0.1%
600 | 10 | 0.1%
70 | 10 | 0.1%
Other values (6193) | 6776 | 75.7%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2044 | 22.8%
0.01 | 4 | < 0.1%
0.05 | 1 | < 0.1%
0.24 | 1 | < 0.1%
0.7 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
49039.57 | 1 | < 0.1%
41050.4 | 1 | < 0.1%
40040.71 | 1 | < 0.1%
38902.71 | 1 | < 0.1%
35131.16 | 1 | < 0.1%

ONEOFF_PURCHASES
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 4014
Unique (%): 44.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 592.4373709
Minimum: 0
Maximum: 40761.25
Zeros: 4302
Zeros (%): 48.1%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 38
Q3: 577.405
95th percentile: 2671.094
Maximum: 40761.25
Range: 40761.25
Interquartile range (IQR): 577.405

Descriptive statistics

Standard deviation: 1659.887917
Coefficient of variation (CV): 2.801794753
Kurtosis: 164.187572
Mean: 592.4373709
Mean absolute deviation (MAD): 766.0949878
Skewness: 10.04508288
Sum: 5302314.47
Variance: 2755227.898
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0000000e+00 5.0000000e-03 3.5000000e-02 1.9890000e+01 2.0045000e+01 ... 4.0719450e+03 5.0958750e+03 8.0312250e+03 1.2828535e+04 4.0761250e+04], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 4302 | 48.1%
45.65 | 46 | 0.5%
50 | 17 | 0.2%
200 | 15 | 0.2%
60 | 13 | 0.1%
100 | 13 | 0.1%
70 | 12 | 0.1%
150 | 12 | 0.1%
1000 | 12 | 0.1%
250 | 11 | 0.1%
Other values (4004) | 4497 | 50.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4302 | 48.1%
0.01 | 7 | 0.1%
0.02 | 2 | < 0.1%
0.05 | 1 | < 0.1%
0.24 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
40761.25 | 1 | < 0.1%
40624.06 | 1 | < 0.1%
34087.73 | 1 | < 0.1%
33803.84 | 1 | < 0.1%
26547.43 | 1 | < 0.1%

INSTALLMENTS_PURCHASES
Real number (ℝ≥0)

ZEROS
Distinct count: 4452
Unique (%): 49.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 411.0676447
Minimum: 0
Maximum: 22500
Zeros: 3916
Zeros (%): 43.8%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 89
Q3: 468.6375
95th percentile: 1750.0875
Maximum: 22500
Range: 22500
Interquartile range (IQR): 468.6375

Descriptive statistics

Standard deviation: 904.3381152
Coefficient of variation (CV): 2.199973963
Kurtosis: 96.57517753
Mean: 411.0676447
Mean absolute deviation (MAD): 482.8530134
Skewness: 7.299119909
Sum: 3679055.42
Variance: 817827.4266
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 9.750000e-01 9.430000e+00 4.485000e+01 9.998000e+01 ... 3.207565e+03 4.290080e+03 7.058560e+03 1.296145e+04 2.250000e+04], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 3916 | 43.8%
100 | 14 | 0.2%
300 | 14 | 0.2%
200 | 14 | 0.2%
150 | 12 | 0.1%
125 | 11 | 0.1%
75 | 9 | 0.1%
225 | 8 | 0.1%
350 | 8 | 0.1%
450 | 8 | 0.1%
Other values (4442) | 4936 | 55.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3916 | 43.8%
1.95 | 1 | < 0.1%
4.44 | 1 | < 0.1%
4.8 | 1 | < 0.1%
6.33 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
22500 | 1 | < 0.1%
15497.19 | 1 | < 0.1%
14686.1 | 1 | < 0.1%
13184.43 | 1 | < 0.1%
12738.47 | 1 | < 0.1%

CASH_ADVANCE
Real number (ℝ≥0)

ZEROS
Distinct count: 4323
Unique (%): 48.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 978.8711125
Minimum: 0
Maximum: 47137.21176
Zeros: 4628
Zeros (%): 51.7%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1113.821139
95th percentile: 4647.169122
Maximum: 47137.21176
Range: 47137.21176
Interquartile range (IQR): 1113.821139

Descriptive statistics

Standard deviation: 2097.163877
Coefficient of variation (CV): 2.142431062
Kurtosis: 52.89943411
Mean: 978.8711125
Mean absolute deviation (MAD): 1261.399944
Skewness: 5.166609074
Sum: 8760896.457
Variance: 4398096.325
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 7.11110800e+00 1.80803675e+01 1.99628170e+01 3.61735065e+01 ... 8.00481417e+03 1.07546266e+04 1.52932549e+04 2.82892975e+04 4.71372118e+04], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 4628 | 51.7%
1286.356207 | 1 | < 0.1%
3816.470266 | 1 | < 0.1%
2495.298926 | 1 | < 0.1%
748.241727 | 1 | < 0.1%
1572.492719 | 1 | < 0.1%
5425.912807 | 1 | < 0.1%
9553.955906 | 1 | < 0.1%
341.360876 | 1 | < 0.1%
1424.442602 | 1 | < 0.1%
Other values (4313) | 4313 | 48.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4628 | 51.7%
14.222216 | 1 | < 0.1%
18.042768 | 1 | < 0.1%
18.117967 | 1 | < 0.1%
18.123413 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
47137.21176 | 1 | < 0.1%
29282.10915 | 1 | < 0.1%
27296.48576 | 1 | < 0.1%
26268.69989 | 1 | < 0.1%
26194.04954 | 1 | < 0.1%

PURCHASES_FREQUENCY
Real number (ℝ≥0)

ZEROS
Distinct count: 47
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4903505484
Minimum: 0
Maximum: 1
Zeros: 2043
Zeros (%): 22.8%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.083333
Median: 0.5
Q3: 0.916667
95th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.833334

Descriptive statistics

Standard deviation: 0.4013707474
Coefficient of variation (CV): 0.8185383879
Kurtosis: -1.638630948
Mean: 0.4903505484
Mean absolute deviation (MAD): 0.368256228
Skewness: 0.06016423586
Sum: 4388.637408
Variance: 0.1610984768
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0416665 0.087121 0.0954545 0.154762 ... 0.845238 0.8944445 0.912879 0.9583335 1. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
1 | 2178 | 24.3%
0 | 2043 | 22.8%
0.083333 | 677 | 7.6%
0.916667 | 396 | 4.4%
0.5 | 395 | 4.4%
0.166667 | 392 | 4.4%
0.833333 | 373 | 4.2%
0.333333 | 367 | 4.1%
0.25 | 345 | 3.9%
0.583333 | 316 | 3.5%
Other values (37) | 1468 | 16.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2043 | 22.8%
0.083333 | 677 | 7.6%
0.090909 | 43 | 0.5%
0.1 | 27 | 0.3%
0.111111 | 18 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
1 | 2178 | 24.3%
0.916667 | 396 | 4.4%
0.909091 | 28 | 0.3%
0.9 | 24 | 0.3%
0.888889 | 18 | 0.2%

ONEOFF_PURCHASES_FREQUENCY
Real number (ℝ≥0)

ZEROS
Distinct count: 47
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2024576836
Minimum: 0
Maximum: 1
Zeros: 4302
Zeros (%): 48.1%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.083333
Q3: 0.3
95th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.3

Descriptive statistics

Standard deviation: 0.2983360652
Coefficient of variation (CV): 1.473572452
Kurtosis: 1.161845601
Mean: 0.2024576836
Mean absolute deviation (MAD): 0.2329477937
Skewness: 1.535612784
Sum: 1811.996268
Variance: 0.08900440779
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0416665 0.087121 0.1055555 0.154762 ... 0.8257575 0.845238 0.912879 0.9583335 1. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 4302 | 48.1%
0.083333 | 1104 | 12.3%
0.166667 | 592 | 6.6%
1 | 481 | 5.4%
0.25 | 418 | 4.7%
0.333333 | 355 | 4.0%
0.416667 | 244 | 2.7%
0.5 | 235 | 2.6%
0.583333 | 197 | 2.2%
0.666667 | 167 | 1.9%
Other values (37) | 855 | 9.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4302 | 48.1%
0.083333 | 1104 | 12.3%
0.090909 | 56 | 0.6%
0.1 | 39 | 0.4%
0.111111 | 26 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
1 | 481 | 5.4%
0.916667 | 151 | 1.7%
0.909091 | 4 | < 0.1%
0.9 | 1 | < 0.1%
0.888889 | 2 | < 0.1%

PURCHASES_INSTALLMENTS_FREQUENCY
Real number (ℝ≥0)

ZEROS
Distinct count: 47
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3644373416
Minimum: 0
Maximum: 1
Zeros: 3915
Zeros (%): 43.7%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.166667
Q3: 0.75
95th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.75

Descriptive statistics

Standard deviation: 0.3974477797
Coefficient of variation (CV): 1.090579187
Kurtosis: -1.398632185
Mean: 0.3644373416
Mean absolute deviation (MAD): 0.361672884
Skewness: 0.509201165
Sum: 3261.714207
Variance: 0.1579647376
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0416665 0.087121 0.1180555 0.154762 ... 0.8257575 0.845238 0.912879 0.9583335 1. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 3915 | 43.7%
1 | 1331 | 14.9%
0.416667 | 388 | 4.3%
0.916667 | 345 | 3.9%
0.833333 | 311 | 3.5%
0.5 | 310 | 3.5%
0.166667 | 305 | 3.4%
0.666667 | 292 | 3.3%
0.75 | 291 | 3.3%
0.083333 | 275 | 3.1%
Other values (37) | 1187 | 13.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 3915 | 43.7%
0.083333 | 275 | 3.1%
0.090909 | 12 | 0.1%
0.1 | 6 | 0.1%
0.111111 | 9 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1 | 1331 | 14.9%
0.916667 | 345 | 3.9%
0.909091 | 25 | 0.3%
0.9 | 19 | 0.2%
0.888889 | 28 | 0.3%

CASH_ADVANCE_FREQUENCY
Real number (ℝ≥0)

ZEROS
Distinct count: 54
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1351442003
Minimum: 0
Maximum: 1.5
Zeros: 4628
Zeros (%): 51.7%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.222222
95th percentile: 0.583333
Maximum: 1.5
Range: 1.5
Interquartile range (IQR): 0.222222

Descriptive statistics

Standard deviation: 0.2001213881
Coefficient of variation (CV): 1.480798937
Kurtosis: 3.334734328
Mean: 0.1351442003
Mean absolute deviation (MAD): 0.1528463515
Skewness: 1.828686266
Sum: 1209.540593
Variance: 0.04004856999
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0416665 0.087121 0.0954545 0.154762 ... 0.763889 0.8257575 0.845238 1.0454545 1.5 ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 4628 | 51.7%
0.083333 | 1021 | 11.4%
0.166667 | 759 | 8.5%
0.25 | 578 | 6.5%
0.333333 | 439 | 4.9%
0.416667 | 273 | 3.1%
0.5 | 215 | 2.4%
0.583333 | 142 | 1.6%
0.666667 | 125 | 1.4%
0.090909 | 70 | 0.8%
Other values (44) | 700 | 7.8%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4628 | 51.7%
0.083333 | 1021 | 11.4%
0.090909 | 70 | 0.8%
0.1 | 39 | 0.4%
0.111111 | 29 | 0.3%

Maximum 5 values
Value | Count | Frequency (%)
1.5 | 1 | < 0.1%
1.25 | 1 | < 0.1%
1.166667 | 2 | < 0.1%
1.142857 | 1 | < 0.1%
1.125 | 1 | < 0.1%

CASH_ADVANCE_TRX
Real number (ℝ≥0)

ZEROS
Distinct count: 65
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.248826816
Minimum: 0
Maximum: 123
Zeros: 4628
Zeros (%): 51.7%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 4
95th percentile: 15
Maximum: 123
Range: 123
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 6.824646744
Coefficient of variation (CV): 2.100649598
Kurtosis: 61.64686248
Mean: 3.248826816
Mean absolute deviation (MAD): 4.002914191
Skewness: 5.721298203
Sum: 29077
Variance: 46.57580318
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 4.5 ... 17.5 24.5 31.5 52.5 123. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 4628 | 51.7%
1 | 887 | 9.9%
2 | 620 | 6.9%
3 | 436 | 4.9%
4 | 384 | 4.3%
5 | 308 | 3.4%
6 | 246 | 2.7%
7 | 205 | 2.3%
8 | 171 | 1.9%
10 | 150 | 1.7%
Other values (55) | 915 | 10.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 4628 | 51.7%
1 | 887 | 9.9%
2 | 620 | 6.9%
3 | 436 | 4.9%
4 | 384 | 4.3%

Maximum 5 values
Value | Count | Frequency (%)
123 | 3 | < 0.1%
110 | 1 | < 0.1%
107 | 1 | < 0.1%
93 | 1 | < 0.1%
80 | 1 | < 0.1%

PURCHASES_TRX
Real number (ℝ≥0)

ZEROS
Distinct count: 173
Unique (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.7098324
Minimum: 0
Maximum: 358
Zeros: 2044
Zeros (%): 22.8%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 7
Q3: 17
95th percentile: 57
Maximum: 358
Range: 358
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 24.85764911
Coefficient of variation (CV): 1.689866236
Kurtosis: 34.79310026
Mean: 14.7098324
Mean absolute deviation (MAD): 14.64307743
Skewness: 4.630655266
Sum: 131653
Variance: 617.9027193
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 8.5 ... 98.5 122.5 159.5 230.5 358. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 2044 | 22.8%
1 | 667 | 7.5%
12 | 570 | 6.4%
2 | 379 | 4.2%
6 | 352 | 3.9%
3 | 314 | 3.5%
4 | 285 | 3.2%
7 | 275 | 3.1%
8 | 267 | 3.0%
5 | 267 | 3.0%
Other values (163) | 3530 | 39.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 2044 | 22.8%
1 | 667 | 7.5%
2 | 379 | 4.2%
3 | 314 | 3.5%
4 | 285 | 3.2%

Maximum 5 values
Value | Count | Frequency (%)
358 | 1 | < 0.1%
347 | 1 | < 0.1%
344 | 1 | < 0.1%
309 | 1 | < 0.1%
308 | 1 | < 0.1%

CREDIT_LIMIT
Real number (ℝ≥0)

Distinct count: 205
Unique (%): 2.3%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 4494.44945
Minimum: 50
Maximum: 30000
Zeros: 0
Zeros (%): 0.0%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 50
5th percentile: 1000
Q1: 1600
Median: 3000
Q3: 6500
95th percentile: 12000
Maximum: 30000
Range: 29950
Interquartile range (IQR): 4900

Descriptive statistics

Standard deviation: 3638.815725
Coefficient of variation (CV): 0.8096243524
Kurtosis: 2.836655932
Mean: 4494.44945
Mean absolute deviation (MAD): 2839.031257
Skewness: 1.522464005
Sum: 40220828.13
Variance: 13240979.88
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
3000 | 784 | 8.8%
1500 | 722 | 8.1%
1200 | 621 | 6.9%
1000 | 614 | 6.9%
2500 | 612 | 6.8%
4000 | 506 | 5.7%
6000 | 463 | 5.2%
5000 | 389 | 4.3%
2000 | 371 | 4.1%
7500 | 277 | 3.1%
Other values (195) | 3590 | 40.1%

Minimum 5 values
Value | Count | Frequency (%)
50 | 1 | < 0.1%
150 | 5 | 0.1%
200 | 3 | < 0.1%
300 | 14 | 0.2%
400 | 3 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
30000 | 2 | < 0.1%
28000 | 1 | < 0.1%
25000 | 1 | < 0.1%
23000 | 2 | < 0.1%
22500 | 1 | < 0.1%

PAYMENTS
Real number (ℝ≥0)

ZEROS
Distinct count: 8711
Unique (%): 97.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1733.143852
Minimum: 0
Maximum: 50721.48336
Zeros: 240
Zeros (%): 2.7%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 89.98892395
Q1: 383.276166
Median: 856.901546
Q3: 1901.134317
95th percentile: 6082.090595
Maximum: 50721.48336
Range: 50721.48336
Interquartile range (IQR): 1517.858151

Descriptive statistics

Standard deviation: 2895.063757
Coefficient of variation (CV): 1.670411693
Kurtosis: 54.77073581
Mean: 1733.143852
Mean absolute deviation (MAD): 1553.741531
Skewness: 5.907619794
Sum: 15511637.48
Variance: 8381394.157
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000000e+00 2.47565000e-02 5.46557440e+01 1.49249077e+02 3.64735164e+02 ... 9.00345148e+03 1.17195868e+04 1.44720480e+04 2.30845738e+04 5.07214834e+04], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 240 | 2.7%
806.587482 | 1 | < 0.1%
836.812414 | 1 | < 0.1%
139.607827 | 1 | < 0.1%
107.242408 | 1 | < 0.1%
433.860714 | 1 | < 0.1%
402.20014 | 1 | < 0.1%
308.886492 | 1 | < 0.1%
238.695115 | 1 | < 0.1%
197.86349 | 1 | < 0.1%
Other values (8701) | 8701 | 97.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 240 | 2.7%
0.049513 | 1 | < 0.1%
0.056466 | 1 | < 0.1%
2.389583 | 1 | < 0.1%
3.500505 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
50721.48336 | 1 | < 0.1%
46930.59824 | 1 | < 0.1%
40627.59524 | 1 | < 0.1%
39461.9658 | 1 | < 0.1%
39048.59762 | 1 | < 0.1%

MINIMUM_PAYMENTS
Real number (ℝ≥0)

MISSING
Distinct count: 8636
Unique (%): > 99.9%
Missing: 313
Missing (%): 3.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 864.2065423
Minimum: 0.019163
Maximum: 76406.20752
Zeros: 0
Zeros (%): 0.0%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0.019163
5th percentile: 73.2820058
Q1: 169.123707
Median: 312.343947
Q3: 825.485459
95th percentile: 2766.56331
Maximum: 76406.20752
Range: 76406.18836
Interquartile range (IQR): 656.361752

Descriptive statistics

Standard deviation: 2372.446607
Coefficient of variation (CV): 2.745231019
Kurtosis: 283.9899859
Mean: 864.2065423
Mean absolute deviation (MAD): 869.0195729
Skewness: 13.62279699
Sum: 7464151.906
Variance: 5628502.901
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%)
299.351881 | 2 | < 0.1%
284.70693 | 1 | < 0.1%
144.641814 | 1 | < 0.1%
1927.887547 | 1 | < 0.1%
6825.44203 | 1 | < 0.1%
342.412476 | 1 | < 0.1%
5583.630482 | 1 | < 0.1%
125.3494 | 1 | < 0.1%
3.19794 | 1 | < 0.1%
140.596138 | 1 | < 0.1%
Other values (8626) | 8626 | 96.4%
(Missing) | 313 | 3.5%

Minimum 5 values
Value | Count | Frequency (%)
0.019163 | 1 | < 0.1%
0.037744 | 1 | < 0.1%
0.05588 | 1 | < 0.1%
0.059481 | 1 | < 0.1%
0.117036 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
76406.20752 | 1 | < 0.1%
61031.6186 | 1 | < 0.1%
56370.04117 | 1 | < 0.1%
50260.75947 | 1 | < 0.1%
43132.72823 | 1 | < 0.1%

PRC_FULL_PAYMENT
Real number (ℝ≥0)

ZEROS
Distinct count: 47
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1537146485
Minimum: 0
Maximum: 1
Zeros: 5903
Zeros (%): 66.0%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.142857
95th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.142857

Descriptive statistics

Standard deviation: 0.2924991962
Coefficient of variation (CV): 1.902871321
Kurtosis: 2.432395301
Mean: 0.1537146485
Mean absolute deviation (MAD): 0.2137870147
Skewness: 1.942819941
Sum: 1375.746104
Variance: 0.0855557798
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 0.0416665 0.087121 0.0954545 0.1055555 ... 0.8257575 0.845238 0.8944445 0.9583335 1. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
0 | 5903 | 66.0%
1 | 488 | 5.5%
0.083333 | 426 | 4.8%
0.166667 | 166 | 1.9%
0.25 | 156 | 1.7%
0.5 | 156 | 1.7%
0.090909 | 153 | 1.7%
0.333333 | 134 | 1.5%
0.1 | 94 | 1.1%
0.2 | 83 | 0.9%
Other values (37) | 1191 | 13.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 5903 | 66.0%
0.083333 | 426 | 4.8%
0.090909 | 153 | 1.7%
0.1 | 94 | 1.1%
0.111111 | 61 | 0.7%

Maximum 5 values
Value | Count | Frequency (%)
1 | 488 | 5.5%
0.916667 | 77 | 0.9%
0.909091 | 19 | 0.2%
0.9 | 16 | 0.2%
0.888889 | 12 | 0.1%

TENURE
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.51731844
Minimum: 6
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 70.0 KiB

Quantile statistics

Minimum: 6
5th percentile: 8
Q1: 12
Median: 12
Q3: 12
95th percentile: 12
Maximum: 12
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.338330769
Coefficient of variation (CV): 0.1162015947
Kurtosis: 7.694823186
Mean: 11.51731844
Mean absolute deviation (MAD): 0.8180239069
Skewness: -2.943017288
Sum: 103080
Variance: 1.791129248
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 6. 6.5 9.5 10.5 11.5 12. ], "bayesian blocks" binning strategy used)
Common values
Value | Count | Frequency (%)
12 | 7584 | 84.7%
11 | 365 | 4.1%
10 | 236 | 2.6%
6 | 204 | 2.3%
8 | 196 | 2.2%
7 | 190 | 2.1%
9 | 175 | 2.0%

Minimum 5 values
Value | Count | Frequency (%)
6 | 204 | 2.3%
7 | 190 | 2.1%
8 | 196 | 2.2%
9 | 175 | 2.0%
10 | 236 | 2.6%

Maximum 5 values
Value | Count | Frequency (%)
12 | 7584 | 84.7%
11 | 365 | 4.1%
10 | 236 | 2.6%
9 | 175 | 2.0%
8 | 196 | 2.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) changes in scale of the two variables. For an exact linear relationship, the steepness of the line therefore does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
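A direct transcription of that definition in plain Python, offered as an illustrative sketch rather than the tool's implementation: the 1/(n - 1) factors of the covariance and of both standard deviations cancel, so sums of centred products suffice. The example also illustrates the invariance property: two straight lines with different positive slopes both give r = 1 up to rounding. The function name is illustrative.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """cov(X, Y) / (std(X) * std(Y)); undefined (division by zero) if either sample is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/(n - 1) factors in the covariance and both standard deviations cancel.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(pearson_r(x, [3 * v + 7 for v in x]))    # steep line
    print(pearson_r(x, [0.1 * v - 2 for v in x]))  # shallow line, same r
    print(pearson_r(x, [v ** 3 for v in x]))       # monotonic but curved: r < 1
```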

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
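That recipe amounts to computing Pearson's r on the ranks. The sketch below assumes the common convention that tied values share the average of the ranks they occupy; it is an illustration, not the tool's code, and the function names are illustrative. Cubing a variable changes Pearson's r but leaves ρ at exactly 1, because the ranks are unchanged.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r computed on the rank variables of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    return _pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(spearman_rho(x, [v ** 3 for v in x]))  # → 1.0
    print(average_ranks([10, 20, 20, 30]))       # → [1.0, 2.5, 2.5, 4.0]
```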

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative association, 0 indicates no association, and +1 indicates a perfect positive association.

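To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair of observations is concordant if X and Y are ordered the same way within it (the observation with the larger X also has the larger Y) and discordant if they are ordered in opposite ways. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a form; variants such as τ-b adjust the denominator for ties.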
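A quadratic-time sketch of the formula just described (the τ-a form), for illustration only. Pairs tied in either variable count as neither concordant nor discordant, and no tie correction is applied, so on heavily tied data (for example TENURE, where 84.7% of values are 12) the result can differ from a tie-corrected variant such as τ-b. The function name is illustrative.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / (n * (n - 1) / 2), with no tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means the pair is tied in x or in y: counted in neither group.
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # One swapped pair out of six: (5 - 1) / 6 = 2/3.
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```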

Missing values

Sample

First rows

Row | CUST_ID | BALANCE | BALANCE_FREQUENCY | PURCHASES | ONEOFF_PURCHASES | INSTALLMENTS_PURCHASES | CASH_ADVANCE | PURCHASES_FREQUENCY | ONEOFF_PURCHASES_FREQUENCY | PURCHASES_INSTALLMENTS_FREQUENCY | CASH_ADVANCE_FREQUENCY | CASH_ADVANCE_TRX | PURCHASES_TRX | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | PRC_FULL_PAYMENT | TENURE
0 | C10001 | 40.900749 | 0.818182 | 95.40 | 0.00 | 95.40 | 0.000000 | 0.166667 | 0.000000 | 0.083333 | 0.000000 | 0 | 2 | 1000.0 | 201.802084 | 139.509787 | 0.000000 | 12
1 | C10002 | 3202.467416 | 0.909091 | 0.00 | 0.00 | 0.00 | 6442.945483 | 0.000000 | 0.000000 | 0.000000 | 0.250000 | 4 | 0 | 7000.0 | 4103.032597 | 1072.340217 | 0.222222 | 12
2 | C10003 | 2495.148862 | 1.000000 | 773.17 | 773.17 | 0.00 | 0.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0 | 12 | 7500.0 | 622.066742 | 627.284787 | 0.000000 | 12
3 | C10004 | 1666.670542 | 0.636364 | 1499.00 | 1499.00 | 0.00 | 205.788017 | 0.083333 | 0.083333 | 0.000000 | 0.083333 | 1 | 1 | 7500.0 | 0.000000 | NaN | 0.000000 | 12
4 | C10005 | 817.714335 | 1.000000 | 16.00 | 16.00 | 0.00 | 0.000000 | 0.083333 | 0.083333 | 0.000000 | 0.000000 | 0 | 1 | 1200.0 | 678.334763 | 244.791237 | 0.000000 | 12
5 | C10006 | 1809.828751 | 1.000000 | 1333.28 | 0.00 | 1333.28 | 0.000000 | 0.666667 | 0.000000 | 0.583333 | 0.000000 | 0 | 8 | 1800.0 | 1400.057770 | 2407.246035 | 0.000000 | 12
6 | C10007 | 627.260806 | 1.000000 | 7091.01 | 6402.63 | 688.38 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0 | 64 | 13500.0 | 6354.314328 | 198.065894 | 1.000000 | 12
7 | C10008 | 1823.652743 | 1.000000 | 436.20 | 0.00 | 436.20 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 0 | 12 | 2300.0 | 679.065082 | 532.033990 | 0.000000 | 12
8 | C10009 | 1014.926473 | 1.000000 | 861.49 | 661.49 | 200.00 | 0.000000 | 0.333333 | 0.083333 | 0.250000 | 0.000000 | 0 | 5 | 7000.0 | 688.278568 | 311.963409 | 0.000000 | 12
9 | C10010 | 152.225975 | 0.545455 | 1281.60 | 1281.60 | 0.00 | 0.000000 | 0.166667 | 0.166667 | 0.000000 | 0.000000 | 0 | 3 | 11000.0 | 1164.770591 | 100.302262 | 0.000000 | 12

Last rows

Row | CUST_ID | BALANCE | BALANCE_FREQUENCY | PURCHASES | ONEOFF_PURCHASES | INSTALLMENTS_PURCHASES | CASH_ADVANCE | PURCHASES_FREQUENCY | ONEOFF_PURCHASES_FREQUENCY | PURCHASES_INSTALLMENTS_FREQUENCY | CASH_ADVANCE_FREQUENCY | CASH_ADVANCE_TRX | PURCHASES_TRX | CREDIT_LIMIT | PAYMENTS | MINIMUM_PAYMENTS | PRC_FULL_PAYMENT | TENURE
8940 | C19181 | 130.838554 | 1.000000 | 591.24 | 0.00 | 591.24 | 0.000000 | 1.000000 | 0.000000 | 0.833333 | 0.000000 | 0 | 6 | 1000.0 | 475.523262 | 82.771320 | 1.00 | 6
8941 | C19182 | 5967.475270 | 0.833333 | 214.55 | 0.00 | 214.55 | 8555.409326 | 0.833333 | 0.000000 | 0.666667 | 0.666667 | 13 | 5 | 9000.0 | 966.202912 | 861.949906 | 0.00 | 6
8942 | C19183 | 40.829749 | 1.000000 | 113.28 | 0.00 | 113.28 | 0.000000 | 1.000000 | 0.000000 | 0.833333 | 0.000000 | 0 | 6 | 1000.0 | 94.488828 | 86.283101 | 0.25 | 6
8943 | C19184 | 5.871712 | 0.500000 | 20.90 | 20.90 | 0.00 | 0.000000 | 0.166667 | 0.166667 | 0.000000 | 0.000000 | 0 | 1 | 500.0 | 58.644883 | 43.473717 | 0.00 | 6
8944 | C19185 | 193.571722 | 0.833333 | 1012.73 | 1012.73 | 0.00 | 0.000000 | 0.333333 | 0.333333 | 0.000000 | 0.000000 | 0 | 2 | 4000.0 | 0.000000 | NaN | 0.00 | 6
8945 | C19186 | 28.493517 | 1.000000 | 291.12 | 0.00 | 291.12 | 0.000000 | 1.000000 | 0.000000 | 0.833333 | 0.000000 | 0 | 6 | 1000.0 | 325.594462 | 48.886365 | 0.50 | 6
8946 | C19187 | 19.183215 | 1.000000 | 300.00 | 0.00 | 300.00 | 0.000000 | 1.000000 | 0.000000 | 0.833333 | 0.000000 | 0 | 6 | 1000.0 | 275.861322 | NaN | 0.00 | 6
8947 | C19188 | 23.398673 | 0.833333 | 144.40 | 0.00 | 144.40 | 0.000000 | 0.833333 | 0.000000 | 0.666667 | 0.000000 | 0 | 5 | 1000.0 | 81.270775 | 82.418369 | 0.25 | 6
8948 | C19189 | 13.457564 | 0.833333 | 0.00 | 0.00 | 0.00 | 36.558778 | 0.000000 | 0.000000 | 0.000000 | 0.166667 | 2 | 0 | 500.0 | 52.549959 | 55.755628 | 0.25 | 6
8949 | C19190 | 372.708075 | 0.666667 | 1093.25 | 1093.25 | 0.00 | 127.040008 | 0.666667 | 0.666667 | 0.000000 | 0.333333 | 2 | 23 | 1200.0 | 63.165404 | 88.288956 | 0.00 | 6